
 instruction template


04543a88eae2683133c1acbef5a6bf77-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

Table 5: All task variations except shape used in VLMbench. The shape variation of each task can be found in the detailed description of each task category.

Variations         Totals  Values
Color              25      seen: red, maroon, lime, green, blue, navy, yellow, cyan, magenta, silver, gray, olive, purple, teal, azure, violet, rose, black, white
                           unseen: brown, gold, pink, chocolate, coral
Size               5       larger, smaller, large, medium, small
Relative Position  5       top, front, rear, left, right
Level              3       top, middle, bottom
Amount             2       fully, slightly
Action Type        2       open, close

Table 6: All object models used in VLMbench. The number after each object class indicates the number of instances of that class.

Here, we list the variations used for these tasks in Table 5. At the beginning of each demonstration, every object in the scene changes its pose. When building an instance-level task with one variation, the other variations also change randomly. For example, in the demonstrations of "Pick & Place objects" with the "size" variation, the colors and relative positions of all objects, including targets and distractors, change randomly. The dataset contains five types of objects, shown in Table 6. We explain each task in detail as follows. Visualizations can be found on the project website.

A.1 Pick & Place Objects

Task Definition: The agent needs to identify the specific object to grasp and then place it into a particular container. The object can be placed anywhere inside the container, in any orientation.




Appendix A

Neural Information Processing Systems

Q: For what purpose was the dataset created?
Q: Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g.,
Q: Who funded the creation of the dataset?
Q: What do the instances that comprise the dataset represent (e.g., documents, photos, people,
Q: How many instances are there in total (of each type, if appropriate)?
As shown in Table 1, the dataset statistics are as follows: Grounding Task: 111,770 samples for training, 21,616 samples for testing. For grounding, we use only one annotation per image.







Delta Activations: A Representation for Finetuned Large Language Models

arXiv.org Artificial Intelligence

The success of powerful open source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. This representation allows for effective clustering by domain and task, revealing structure in the model landscape. Delta Activations also has desirable properties: it is robust across finetuning settings and exhibits an additive property when finetuning datasets are mixed. In addition, we show that Delta Activations can embed tasks via few-shot finetuning, and we further explore its use for model selection and merging. We hope Delta Activations can facilitate the practice of reusing publicly available models. Code is available at https://github.com/OscarXZQ/delta_activations.
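The core idea in the abstract above, embedding a finetuned model by how its activations shift relative to the base model, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which layer, probe prompts, or aggregation are used, so the sketch assumes activations have already been extracted into arrays of shape (num_prompts, hidden_dim) and simply averages the difference. The function names `delta_embedding` and `cosine_similarity` are hypothetical, and the toy data is synthetic.

```python
import numpy as np


def delta_embedding(base_acts, tuned_acts):
    """Average activation shift of a finetuned model relative to its base.

    Both inputs are assumed to have shape (num_prompts, hidden_dim) and to
    come from the same probe prompts (an assumption of this sketch).
    """
    base_acts = np.asarray(base_acts, dtype=float)
    tuned_acts = np.asarray(tuned_acts, dtype=float)
    if base_acts.shape != tuned_acts.shape:
        raise ValueError("base and finetuned activations must share a shape")
    return (tuned_acts - base_acts).mean(axis=0)


def cosine_similarity(u, v):
    """Cosine similarity between two embeddings (0.0 if either is zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


# Synthetic stand-ins for real activations: two "math" finetunes shift the
# base activations along one direction, a "code" finetune along another.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
math_dir = np.eye(16)[0]
code_dir = np.eye(16)[1]

math_a = base + math_dir + 0.01 * rng.normal(size=base.shape)
math_b = base + math_dir + 0.01 * rng.normal(size=base.shape)
code_a = base + code_dir + 0.01 * rng.normal(size=base.shape)

emb_math_a = delta_embedding(base, math_a)
emb_math_b = delta_embedding(base, math_b)
emb_code_a = delta_embedding(base, code_a)

same_domain = cosine_similarity(emb_math_a, emb_math_b)
cross_domain = cosine_similarity(emb_math_a, emb_code_a)
print(f"same domain: {same_domain:.3f}, cross domain: {cross_domain:.3f}")
```

In this toy setup the two "math" embeddings point in nearly the same direction while the "code" embedding is close to orthogonal, which is the kind of structure that clustering by domain would rely on. Real activations would be far noisier than this synthetic data.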


MEKiT: Multi-source Heterogeneous Knowledge Injection Method via Instruction Tuning for Emotion-Cause Pair Extraction

arXiv.org Artificial Intelligence

Although large language models (LLMs) excel in text comprehension and generation, their performance on the Emotion-Cause Pair Extraction (ECPE) task, which requires reasoning ability, often falls short of that of smaller language models. The main reason is a lack of auxiliary knowledge, which limits LLMs' ability to perceive emotions and reason about causes effectively. To address this issue, we propose a novel Multi-source hEterogeneous Knowledge injection meThod, MEKiT, which integrates heterogeneous internal emotional knowledge and external causal knowledge. Specifically, for these two kinds of knowledge, which differ in aspect and structure, we incorporate instruction templates and mix data for instruction tuning, which respectively help LLMs identify emotions more comprehensively and reason about causes more accurately. Experimental results demonstrate that MEKiT provides a more effective and adaptable solution for the ECPE task, outperforming the compared baselines and substantially improving the performance of LLMs on the ECPE task.